Estimating the unknown number of classes in a population has
numerous important applications. In a Poisson mixture model, the
problem is reduced to estimating the odds that a class is undetected
in a sample. The discontinuity of the odds prevents the existence
of locally unbiased and informative estimators and restricts
confidence intervals to be one-sided. Confidence intervals for the number
of classes are also necessarily one-sided. A sequence of lower bounds
to the odds is developed and used to define pseudo maximum
likelihood estimators for the number of classes.

1. Introduction. The species problem has a wide variety of applications
[3]. The term "species" has been endowed with many meanings such as
taxa, words known by an author and expressed genes in a tissue. Consider
a population of infinitely many individuals belonging to $c$ distinct classes
labeled by $i = 1, 2, \ldots, c$. In a sample of $S$ individuals, $Y_i$ individuals belong
to the $i$th class. The $i$th class is not detected when $Y_i = 0$. Estimating the
number of classes $c$ from those $Y_i > 0$ is a well-known difficult problem. For
example, I. J. Good pointed out that "I don't believe it is usually possible
to estimate the number of species, but only an appropriate lower bound to
that number. This is because there is nearly always a good chance that there
are a very large number of extremely rare species" [3].

In the literature, $Y_i$ is usually treated as a Poisson random variable
with mean $\lambda_i$ and the $\lambda_i$ are assumed to follow a mixing distribution $P$
over $(0, \infty)$. The $Y_i$ arise as a sample from $g_P(y) = \int e^{-\lambda}\lambda^y/(y!)\,dP(\lambda)$.
There are $n = \sum_{i=1}^{c} I(Y_i > 0)$ detected classes in the sample. Because $n \sim$
binomial$(c, 1 - g_P(0))$, the maximum likelihood estimator (MLE) of $c$ given
$P$ is the integer part of $\hat c(\theta) = n(1 + \theta)$, where $\theta = g_P(0)/\{1 - g_P(0)\}$ is the
odds that a single class is undetected in the sample. When an estimator $\hat\theta$ for
$\theta$ is substituted into $\hat c(\theta)$, we obtain a pseudo MLE for $c$ [11]. The problem
of estimating $c$ is thereby reduced to that of estimating $\theta$.
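
The reduction can be sketched in a few lines for the simplest case, assuming $P$ is a point mass at a single rate. The function names `undetected_odds` and `pseudo_mle` and the rate 1.0 are illustrative assumptions, not part of the paper.

```python
import math

def undetected_odds(lam):
    """Odds theta = g_P(0) / (1 - g_P(0)) when P is a point mass at lam."""
    g0 = math.exp(-lam)            # Poisson probability of a zero count
    return g0 / (1.0 - g0)

def pseudo_mle(n, theta):
    """Integer part of c(theta) = n * (1 + theta)."""
    return math.floor(n * (1.0 + theta))

theta = undetected_odds(1.0)       # hypothetical rate, equals 1 / (e - 1)
c_hat = pseudo_mle(55, theta)      # 55 detected classes, as in the cholera data
```

In practice $\theta$ is unknown, so an estimate $\hat\theta$ would be plugged in at the second step.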

The idea of the nonexistence of inferential procedures is not unfamiliar
to statisticians (e.g., [1, 10, 13, 17]). To make Good's point concrete, we
will show that the odds $\theta$ is discontinuous. There are several consequences:
no locally unbiased and informative estimator for $\theta$, no genuine two-sided
confidence intervals and arbitrarily bad informativity when reducing bias to
zero. However, because $\theta$ is lower semicontinuous, there exist nonparametric
lower confidence limits. A sequence of closed-form lower bounds to $\theta$ is
developed. Similar results concerning inference on the number of classes $c$
hold. The upper confidence limits for $c$ are often infinite. The estimators for
the lower bounds to $\theta$ yield estimators for lower bounds to $c$.

This article is organized as follows. The mixture model will be described
in Section 2. In Section 3 the discontinuity of $\theta$ and its consequences will
be investigated. In Section 4 we will demonstrate the lower semicontinuity
of $\theta$, construct lower confidence limits and propose its lower bounds. The
problem of estimating $c$ will be considered in Section 5. In Section 6 an
epidemiological application and a genomic application will be studied. In
Section 7 extensions to related problems will be discussed. All proofs are
contained in Section 8.

2. The mixture model. Let $n_y = \sum_{i=1}^{c} I(Y_i = y)$. Since the $Y_i$ arise from
$g_P(y)$, $(n_0, n_1, \ldots)$ follows a multinomial density. When $n_0$, the number of
classes in the population unobserved in the sample, is replaced with $c - n$,
this yields

    $p_1(c, P) = \frac{c!}{(c - n)!\,\prod_{x=1}^{\infty} n_x!}\; g_P^{c-n}(0) \prod_{x=1}^{\infty} g_P^{n_x}(x).$

This likelihood can be written as $p_1(c, P) = p_2(c, P)\,p_3(n, P)$, where $p_2(c, P)$
is the density of $n$ and $p_3(n, P)$ is the conditional density of $(n_1, n_2, \ldots)$
given $n$:

    $p_2(c, P) = \binom{c}{n}\, g_P^{c-n}(0)\,\{1 - g_P(0)\}^{n},$

    $p_3(n, P) = \frac{n!}{\prod_{x=1}^{\infty} n_x!} \prod_{x=1}^{\infty} \left\{ \frac{g_P(x)}{1 - g_P(0)} \right\}^{n_x}.$
The likelihood of $n$ is binomial, as indicated before, and depends on both
$c$ and $\theta$. The conditional likelihood has no dependence on $c$, but contains
most of the information about $P$. It can be analyzed as follows. Conditioning
on $n$, those $Y_i > 0$ follow a zero-truncated mixture $g_P(x)/\{1 - g_P(0)\}$.
Rewrite them as $X_1, X_2, \ldots, X_n$. Let $f_\lambda(x) = \lambda^x/\{x!(e^\lambda - 1)\}$ and $f_Q(x) =
\int f_\lambda(x)\,dQ(\lambda)$, where

(2.1) $dQ(\lambda) = (1 - e^{-\lambda})\,dP(\lambda) \Big/ \int (1 - e^{-\lambda})\,dP(\lambda).$

Because $f_Q(x) = g_P(x)/\{1 - g_P(0)\}$ for $x \ge 1$, the $X_i$ can be treated as a
sample from a mixture of zero-truncated Poisson densities. The joint density
of the $X_i$ is

    $f_Q^{(n)}(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} f_Q(x_i) = \prod_{x=1}^{\infty} f_Q^{n_x}(x).$
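
The passage from $P$ to $Q$ can be checked numerically. The sketch below assumes a hypothetical two-point mixing distribution $P$ (support and weights are made up for illustration), builds $Q$ by the reweighting in (2.1), and compares the zero-truncated mixture $f_Q(x)$ with $g_P(x)/\{1 - g_P(0)\}$. It also compares the odds computed from $P$ with the integral $\int (e^\lambda - 1)^{-1}\,dQ(\lambda)$ that the paper uses in Section 3.

```python
import math

def g_P(y, support, weights):
    """Poisson mixture pmf g_P(y) for a discrete mixing distribution P."""
    return sum(w * math.exp(-lam) * lam**y / math.factorial(y)
               for lam, w in zip(support, weights))

def q_weights(support, weights):
    """Weights of Q from (2.1): reweight P by the detection probability 1 - exp(-lam)."""
    raw = [w * (1.0 - math.exp(-lam)) for lam, w in zip(support, weights)]
    total = sum(raw)
    return [r / total for r in raw]

def f_Q(x, support, q):
    """Mixture of zero-truncated Poisson densities lam^x / (x! (e^lam - 1))."""
    return sum(w * lam**x / (math.factorial(x) * math.expm1(lam))
               for lam, w in zip(support, q))

support, weights = [0.5, 2.0], [0.7, 0.3]    # hypothetical P
q = q_weights(support, weights)
g0 = g_P(0, support, weights)
max_gap = max(abs(f_Q(x, support, q) - g_P(x, support, weights) / (1.0 - g0))
              for x in range(1, 11))
theta_from_P = g0 / (1.0 - g0)
theta_from_Q = sum(w / math.expm1(lam) for lam, w in zip(support, q))
```

Both `max_gap` and the difference between the two odds should be zero up to rounding error.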

Note that $Q$ has no mass on zero. The nonparametric mixture model refers
to $\mathcal{F} = \{f_Q : Q \in \mathcal{Q}\}$, where $\mathcal{Q}$ contains all legitimate mixing distributions.

Lemma 2.1. $\mathcal{F}$ is identifiable in the sense that $f_Q = f_G$ yields $Q = G$.

Finally, note that $n$ plays a dual role as the number of detected classes
and the sample size of the $X_i$, and $c$ also plays a dual role as the parameter of
interest and the "sample size" of the $Y_i$. Our asymptotic results concerning
$\theta$-estimation will be based on $n$ becoming infinite, which implies that $c$ goes
to infinity as $c = E(n) \times (1 + \theta)$, a common natural practice in the literature
that deals with nonstandard problems with integer parameters. However,
our key result for $c$-estimation will be finite-sample in nature, so that no
asymptotics are required.

3. Discontinuity. We will show that estimating $\theta$ is difficult in several
aspects.

We write $\theta = \theta(f_Q)$ because of Lemma 2.1 and the fact that $\theta = \int (e^\lambda -
1)^{-1}\,dQ(\lambda)$. As the mass of $Q$ at zero is nonidentifiable and mass near zero
is nearly undetectable, we have the following result.

Lemma 3.1. $\theta$ is Hellinger discontinuous at any $f_Q \in \mathcal{F}$.
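
The mechanism behind this lemma can be illustrated numerically with the contamination used in the proofs, $Q_s = (1 - s)Q + s\,\delta(s^2)$. In the sketch below the two-point $Q$ is a hypothetical choice, and the total variation distance is truncated at `x_max`, which can only understate it. As $s$ shrinks, the densities become indistinguishable while the odds diverge.

```python
import math

def f_Q(x, support, q):
    """Zero-truncated Poisson mixture density at x."""
    return sum(w * lam**x / (math.factorial(x) * math.expm1(lam))
               for lam, w in zip(support, q))

def theta(support, q):
    """Odds theta(f_Q) = integral of (e^lam - 1)^{-1} dQ(lam)."""
    return sum(w / math.expm1(lam) for lam, w in zip(support, q))

def contaminate(support, q, s):
    """Q_s = (1 - s) Q + s * delta(s^2): mass s placed at the tiny rate s^2."""
    return support + [s * s], [(1.0 - s) * w for w in q] + [s]

def total_variation(sup_a, q_a, sup_b, q_b, x_max=60):
    """Sum of |f_a(x) - f_b(x)| over x = 1..x_max (a truncated tau)."""
    return sum(abs(f_Q(x, sup_a, q_a) - f_Q(x, sup_b, q_b))
               for x in range(1, x_max + 1))

support, q = [1.0, 3.0], [0.5, 0.5]          # hypothetical Q
rows = []
for s in (1e-1, 1e-2, 1e-3):
    sup_s, q_s = contaminate(support, q, s)
    rows.append((s, total_variation(support, q, sup_s, q_s), theta(sup_s, q_s)))
```

Each row holds $s$, the distance between $f_{Q_s}$ and $f_Q$, and $\theta(f_{Q_s})$; the distance stays below $2s$ while the odds grow roughly like $1/s$.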

The discontinuity excludes the existence of estimators that have desirable
properties in terms of unbiasedness and informativity [13].

An estimator $\hat\theta_n$ for $\theta$ is locally unbiased at $f_Q$ if there exists $\epsilon > 0$ such
that

    $\sup\{|E_G(\hat\theta_n) - \theta(f_G)| : f_G \in B(f_Q, \epsilon)\} = 0,$

where $E_G(\cdot)$ means taking expectation given $G \in \mathcal{Q}$ and where $B(f_Q, \epsilon)$ is a
ball,

    $B(f_Q, \epsilon) = \Big\{ f_G \in \mathcal{F} : \sum_{x=1}^{\infty} [f_Q^{1/2}(x) - f_G^{1/2}(x)]^2 \le \epsilon^2 \Big\}.$

The estimator $\hat\theta_n$ is locally informative at $f_Q$ if there exists $K(f_Q) > 0$ such
that

    $\limsup_{\epsilon \to 0} \sup\{E_G(\hat\theta_n^2) : f_G \in B(f_Q, \epsilon)\} < K(f_Q).$
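
An estimator (sequence) $\hat\theta_n$ for $\theta$ is locally asymptotically unbiased
(l.a.-unbiased) at $f_Q$ with the rate of convergence $r(n) \ge n^{-1/2}$ if there exists
$\epsilon > 0$ such that

    $\lim_{m \to \infty} \limsup_{n \to \infty} \sup\Big\{ \Big| E_G\Big[ l_m\Big( \frac{\hat\theta_n - \theta(f_G)}{r(n)} \Big) \Big] \Big| : f_G \in B(f_Q, \epsilon n^{-1/2}) \Big\} = 0,$

where $l_m(z) = z - \mathrm{sign}(z)\max(|z| - m, 0)$. At $f_Q$, $\hat\theta_n$ is locally asymptotically
informative (l.a.-informative) if there exist $\epsilon > 0$ and $K(f_Q) > 0$ such that

    $\lim_{m \to \infty} \limsup_{n \to \infty} \sup\Big\{ E_G\Big[ l_m^2\Big( \frac{\hat\theta_n - \theta(f_G)}{r(n)} \Big) \Big] : f_G \in B(f_Q, \epsilon n^{-1/2}) \Big\} < K(f_Q).$
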

Theorem 3.1. $\theta$ has no locally unbiased and locally informative
estimator.

Theorem 3.2. $\theta$ has no l.a.-unbiased and l.a.-informative estimator.

Although bias is often the main concern, the discontinuity of $\theta$ will
challenge our endeavor to reduce bias as a method for improving estimation
accuracy.

Theorem 3.3. If $\{\hat\theta_{n,m}\}_{m=1}^{\infty}$ is a sequence of estimators for $\theta$ with fixed
$n$, such that $\lim_{m \to \infty} |E_G(\hat\theta_{n,m}) - \theta(f_G)| = 0$ for $f_G$ in $B(f_Q, \epsilon_0)$ with $\epsilon_0 > 0$,
then

    $\lim_{m \to \infty} \sup\{E_G(\hat\theta_{n,m}^2) : f_G \in B(f_Q, \epsilon)\} = \infty, \quad \epsilon > 0.$

Our ability to construct two-sided confidence intervals is also challenged.
If, somewhere in $\mathcal{F}$, a confidence interval has a finite upper confidence limit
with probability one, then, somewhere in $\mathcal{F}$, its coverage probability is zero
[10].

Theorem 3.4. If $[\hat\theta_{n,l}, \hat\theta_{n,u}]$ is a confidence interval, then

    $\sup\{\Pr_Q(\infty \notin [\hat\theta_{n,l}, \hat\theta_{n,u}]) : Q \in \mathcal{Q}\} = 1$

implies that

    $\inf\{\Pr_Q(\theta(f_Q) \in [\hat\theta_{n,l}, \hat\theta_{n,u}]) : Q \in \mathcal{Q}\} = 0.$

One can also consider the possibility that $\hat\theta_{n,u}$ is an upper confidence
limit, that is,

(3.1) $\inf\{\Pr_Q(\hat\theta_{n,u} \ge \theta(f_Q)) : Q \in \mathcal{Q}\} \ge 1 - \alpha, \quad \alpha \in (0, 1).$

If the advertised confidence level is guaranteed, then the upper confidence
limit will be infinite with large probability.

Theorem 3.5. $\inf\{\Pr_Q(\hat\theta_{n,u} = \infty) : Q \in \mathcal{Q}\} \ge 1 - \alpha.$

4. Lower bounds. We will construct lower bounds to $\theta$.
Although $\theta$ is discontinuous, it admits lower bounds, because of the
following.

Lemma 4.1. $\theta$ is Kolmogorov lower semicontinuous at any $f_Q \in \mathcal{F}$.

From [10], given $\epsilon > 0$ and a distribution function $F_0$, with $F_Q(x) =
\sum_{i=1}^{x} f_Q(i)$, the $\epsilon$-lower envelope of $\theta$ at $F_0$ is

(4.1) $\theta(F_0; \epsilon) = \inf\{\theta(f_Q) : d(F_Q, F_0) \le \epsilon,\ f_Q \in \mathcal{F}\},$

where $d(F_Q, F_0)$ is the Kolmogorov distance of distribution functions $F_Q$ and
$F_0$,

    $d(F, F^*) = \sup\{|F(x) - F^*(x)| : x \in (-\infty, \infty)\}.$

A conservative $1 - \alpha$ lower confidence limit is $\theta(F_n; \epsilon_n)$, where $F_n(x) =
\sum_{i=1}^{x} f_n(i)$, $f_n(x) = n_x/n$ and $\epsilon_n$ is the $1 - \alpha$ quantile of the Kolmogorov
distance of uniform(0, 1) and the empirical distribution of $n$ random variables
from it.

Theorem 4.1. $\sup\{\Pr_Q(\theta(F_n; \epsilon_n) \le \theta(f_Q)) : Q \in \mathcal{Q}\} \ge 1 - \alpha.$

Calculating $\theta(F_n; \epsilon_n)$ requires the solution of the optimization problem
in (4.1). Given a grid $\{\xi_j\}_{j=1}^{J} \subset (0, \infty)$ with $Q = \sum_{j=1}^{J} \pi_j \delta(\xi_j)$, where $\delta(\lambda)$
is a degenerate distribution at $\lambda$, the discretized version of (4.1) is a linear
program, due to the use of the Kolmogorov distance and the linearity of
$\theta(f_Q)$ and $F_Q(x)$ in $Q$.
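
Written out for a grid $\xi_1, \ldots, \xi_J$ with weights $\pi_1, \ldots, \pi_J$, the linear program might take the following form. The grid size $J$, the symbol $F_{\xi_j}$ for the distribution function of the zero-truncated Poisson density $f_{\xi_j}$, and the cutoff $x_{\max}$ (the largest observed count) are notational assumptions made here for illustration.

```latex
\begin{aligned}
\min_{\pi_1,\dots,\pi_J}\quad
  & \sum_{j=1}^{J} \pi_j \,(e^{\xi_j}-1)^{-1} \\
\text{subject to}\quad
  & -\epsilon_n \;\le\; \sum_{j=1}^{J} \pi_j F_{\xi_j}(x) - F_n(x) \;\le\; \epsilon_n,
    \qquad x = 1,\dots,x_{\max}, \\
  & \sum_{j=1}^{J} \pi_j = 1, \qquad \pi_j \ge 0, \quad j = 1,\dots,J .
\end{aligned}
```

Constraints beyond $x_{\max}$ are redundant in this sketch: there $F_n(x) = 1$ and $F_Q(x)$ is nondecreasing and at most one, so the constraint at $x_{\max}$ already implies them.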

There are alternative lower bounds to $\theta$. Let $\mu(x) = \int \lambda^x\,d\Phi(\lambda)$ be the $x$th
moment of a measure $\Phi$ over $(0, \infty)$ with $d\Phi(\lambda) = (e^\lambda - 1)^{-1}\,dQ(\lambda)$. Note
that

    $\mu(0) = \theta(f_Q), \quad \mu(x) = x!\,f_Q(x), \quad x = 1, 2, \ldots.$

When $\Gamma_k = (\mu(i + j))_{i,j=1}^{k}$ is positive definite, with $a_k = (\mu(j))_{j=1}^{k}$, define

(4.2) $\theta_k = \theta_k(f_Q) = a_k' \Gamma_k^{-1} a_k.$

Theorem 4.2. Let $\chi(Q)$ be the number of support points of $Q$. If $\chi(Q) <
\infty$, then $\theta_1 < \cdots < \theta_{\chi(Q)} = \theta(f_Q)$, and $\theta_1 < \theta_2 < \cdots < \lim_{k \to \infty} \theta_k = \theta(f_Q)$
otherwise.

The approximation bias refers to $\theta_k - \theta$, whose absolute value decreases
in $k$. The inferential challenge arises because the variance in $\theta_k$-estimation
increases in $k$.
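
The behavior described in Theorem 4.2 can be checked on a small example. The sketch below assumes a hypothetical $Q$ with two support points, computes the moments $\mu(x)$ of $\Phi$ directly, and evaluates the quadratic form in (4.2) with a plain Gaussian elimination. The helper names are illustrative.

```python
import math

def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    sol = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = (aug[r][n] - tail) / aug[r][r]
    return sol

def moment(x, support, q):
    """mu(x) = integral of lam^x dPhi(lam), with dPhi = (e^lam - 1)^{-1} dQ."""
    return sum(w * lam**x / math.expm1(lam) for lam, w in zip(support, q))

def theta_k(k, support, q):
    """theta_k = a_k' Gamma_k^{-1} a_k as in (4.2)."""
    a = [moment(j, support, q) for j in range(1, k + 1)]
    gamma = [[moment(i + j, support, q) for j in range(1, k + 1)]
             for i in range(1, k + 1)]
    return sum(ai * yi for ai, yi in zip(a, solve(gamma, a)))

support, q = [0.5, 2.0], [0.6, 0.4]          # hypothetical Q with two support points
theta = moment(0, support, q)                # mu(0) = theta(f_Q)
theta_1, theta_2 = theta_k(1, support, q), theta_k(2, support, q)
```

Here `theta_1` falls strictly below `theta`, while `theta_2` recovers it up to rounding, as the theorem predicts when $\chi(Q) = 2$.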

To find the condition under which the lower bound $\theta_k$ is Fisher
consistent, we consider partitioning the mixture model $\mathcal{F}$ into "sieves" $\mathcal{F}_k =
\{f_Q : \chi(Q) = k\}$.

Theorem 4.3. $\theta_k(f_Q) = \theta(f_Q)$ iff $f_Q \in \mathcal{F}_k$; $\theta_k(f_Q) < \theta(f_Q)$ if $f_Q \in \bigcup_{j=k+1}^{\infty} \mathcal{F}_j$.

The lower bound $\theta_k$ is a functional that approximates $\theta$. A pre-existing
nonparametric estimator for $c$ can also define an approximation functional
to $\theta$. For example, from [6, 7, 9] one recognizes, with $s_j(f_Q) = \sum_{x=1}^{\infty} x^j f_Q(x)$,

    $\theta_{DR}(f_Q) = \frac{1}{1 - f_Q(1)/s_1(f_Q)} - 1,$

    $\theta_{CL}(f_Q) = \frac{f_Q(1)\,[\,s_2(f_Q) - f_Q(1) - s_1(f_Q)\{s_1(f_Q) - f_Q(1)\}\,]}{\{s_1(f_Q) - f_Q(1)\}^2},$

    $\theta_{CB}(f_Q) = \frac{1 - f_Q(1)}{1 - f_Q(1)\,s_2(f_Q)/s_1^2(f_Q)} - 1.$

Unlike the $\theta_k$, it is not easy to specify the conditions under which each one
is Fisher consistent, except that $\theta_{DR} = \theta_{CL} = \theta_{CB} = \theta$ when $Q = \delta(\lambda)$.
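
As a rough check, the three pre-existing functionals can be evaluated at the empirical density $f_n$ of the cholera data from Section 6. The algebraic forms coded below are assumptions: they are written in terms of $f(1)$, $s_1$ and $s_2$, and the form used for `theta_cl` does not truncate any intermediate quantity at zero. They were chosen because they reproduce the first row of Table 1.

```python
def summaries(counts):
    """Return f_n(1), s_1 and s_2 from a dict {x: n_x} of frequency counts."""
    n = sum(counts.values())
    f1 = counts.get(1, 0) / n
    s1 = sum(x * nx for x, nx in counts.items()) / n
    s2 = sum(x * x * nx for x, nx in counts.items()) / n
    return f1, s1, s2

def theta_dr(f1, s1, s2):
    return 1.0 / (1.0 - f1 / s1) - 1.0

def theta_cl(f1, s1, s2):
    # Assumed closed form, equivalent to f1 * (s2 - f1 - s1 * (s1 - f1)) / (s1 - f1)^2
    return f1 * (s2 - f1 - s1 * s1 + s1 * f1) / (s1 - f1) ** 2

def theta_cb(f1, s1, s2):
    return (1.0 - f1) / (1.0 - f1 * s2 / s1 ** 2) - 1.0

cholera = {1: 32, 2: 16, 3: 6, 4: 1}         # frequency counts n_x from Section 6
estimates = [round(f(*summaries(cholera)), 3)
             for f in (theta_dr, theta_cl, theta_cb)]
```

The rounded values in `estimates` agree with the cholera entries 0.593, 0.544 and 0.484 reported in Table 1.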

The lower bound $\theta_k$ is the odds of a mixing distribution from which the
derived measure has the same first $2k + 1$ moments as $\Phi$ derived from $Q$.

Theorem 4.4. For $k \le \chi(Q)$, there is a mixing distribution $Q_k$ with
$\chi(Q_k) = k$, $\theta(f_{Q_k}) = \theta_k$, $f_{Q_k}(x) = f_Q(x)$, $x = 1, 2, \ldots, 2k$.

To estimate $\theta_k$, we consider the empirical moments $\hat\mu(x) = x!\,f_n(x)$ and
their matrices $\hat a_k = (\hat\mu(j))_{j=1}^{k}$ and $\hat\Gamma_k = (\hat\mu(i + j))_{i,j=1}^{k}$. For $k \le \hat\chi_n < \infty$,
define $\hat\theta_k = \hat a_k' \hat\Gamma_k^{-1} \hat a_k$, where $\hat\chi_n = \max\{k : |\hat\Gamma_j| > 0,\ j = 1, 2, \ldots, k\}$.

Theorem 4.5. As $n$ goes to infinity, $\hat\chi_n$ estimates $\chi(Q)$ consistently
when $\chi(Q) < \infty$. For finite $k \le \chi(Q)$, as $n$ goes to infinity, $\hat\theta_k$ exists almost
surely and $n^{1/2}(\hat\theta_k - \theta_k)$ converges to a zero-mean normal distribution.
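
A minimal sketch of the moment estimator follows, using exact rational arithmetic so that no rounding enters before the final conversion. The counts are the EST frequency data listed in Section 6. The helper `solve_exact` is an illustrative Gauss-Jordan routine, not the authors' implementation.

```python
from fractions import Fraction
from math import factorial

def solve_exact(matrix, rhs):
    """Gauss-Jordan elimination over Fractions for a small nonsingular system."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[r][n] / aug[r][r] for r in range(n)]

def theta_hat(k, counts):
    """theta_hat_k = a_hat' Gamma_hat^{-1} a_hat with mu_hat(x) = x! * n_x / n."""
    n = sum(counts.values())
    mu = lambda x: Fraction(factorial(x) * counts.get(x, 0), n)
    a = [mu(j) for j in range(1, k + 1)]
    gamma = [[mu(i + j) for j in range(1, k + 1)] for i in range(1, k + 1)]
    return sum(ai * yi for ai, yi in zip(a, solve_exact(gamma, a)))

est = {1: 1434, 2: 253, 3: 71, 4: 33, 5: 11, 6: 6, 7: 2, 8: 3, 9: 1, 10: 2,
       11: 2, 12: 1, 13: 1, 14: 1, 16: 2, 23: 1, 27: 1}
theta_1 = float(theta_hat(1, est))
theta_2 = float(theta_hat(2, est))
```

Rounded to three decimals, `theta_1` and `theta_2` match the EST entries 2.227 and 2.849 in Table 1.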

Finally, an estimator for an approximation functional can also be
calculated from $f_{\hat Q}$ with $\hat Q$ being the nonparametric MLE [12, 14]. Note that
$\theta(f_{\hat Q}) = \theta_{\chi(\hat Q)}(f_{\hat Q})$ is the most greedy one among the $\theta_k(f_{\hat Q})$ in terms of
approximation bias reduction.

5. Inference on c. We turn to unconditional inference on $c$.
As $c$ is identifiable given $\theta$ from $p_2(c, P)$ and $\theta$ is identifiable, it follows
that $c$ is identifiable.

Let $\hat c_u$ be a $(1 - \alpha)$-level upper confidence limit for $c$, that is,

(5.1) $\Pr_{c,P}(\hat c_u \ge c) \ge 1 - \alpha \quad \forall c \ge 1,\ \forall P.$

Theorem 5.1. For $(c, P)$, $\Pr_{c,P}(\hat c_u = \infty) \ge A(c, 1 - g_P(0)) - \alpha$, where

    $A(c, \varrho) = \sum_{x=0}^{c} \min\Big\{ \binom{c}{x} \varrho^x (1 - \varrho)^{c-x},\ \frac{e^{-c\varrho}(c\varrho)^x}{x!} \Big\}.$

The conclusion in Theorem 5.1 is slightly weaker than that in
Theorem 3.5, as the distribution of $n$ retains a small amount of information about $c$
from the testing affinity (see, e.g., [10]) of binomial$(c, \varrho)$ and binomial$(c', \varrho')$,
with $\varrho = 1 - g_P(0)$, such that $c'\varrho'$ approaches $c\varrho$ when $c'$ goes to infinity. The
bound $A(c, \varrho) - \alpha$ in Theorem 5.1 depends on $c$ and $P$ through the
functional $\varrho$. From the fact that $A(c, 0) = 1$, we can find a pair of $(c, \varrho)$ such
that $A(c, \varrho)$ is arbitrarily close to one. For a fixed $c$, when the probability
$\varrho$ of a class being detected increases, the probability of the upper
confidence limit being infinite will decrease. For an extremely large $\varrho$, one might
have a negative value of $A(c, \varrho) - \alpha$. In particular, by Stirling's formula
$c! \approx (2\pi c)^{1/2}(c/e)^c$, one has $A(c, 1) = e^{-c}c^c/(c!) \approx (2\pi c)^{-1/2}$. For $\alpha = 0.05$,
$A(c, 1) > \alpha$ for $1 \le c < 64$ and $A(c, 1) < \alpha$ for $c \ge 64$. Although the testing
affinity $A(c, \varrho)$ is a function of both $c$ and $\varrho$, for $c$ relatively large it will
change little in $c$ for a fixed $\varrho$. There exist lower bounds for $A(c, \varrho)$ that are
functions of $\varrho$ only, for example, $A(c, \varrho) \ge 1 - 2^{-1}\varrho(1 - \varrho)^{-1/2}$ [18].
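
The threshold near $c = 64$ can be reproduced with a short computation. The sketch assumes that $A(c, \varrho)$ is the sum over $x$ of the smaller of the binomial$(c, \varrho)$ and Poisson$(c\varrho)$ probabilities, which is consistent with $A(c, 0) = 1$ and $A(c, 1) = e^{-c}c^c/c!$ stated above. The function names are illustrative.

```python
import math

def binom_pmf(x, c, rho):
    """Binomial(c, rho) probability of x."""
    return math.comb(c, x) * rho**x * (1.0 - rho)**(c - x)

def poisson_pmf(x, mean):
    """Poisson(mean) probability of x, computed on the log scale."""
    if mean == 0.0:
        return 1.0 if x == 0 else 0.0
    return math.exp(-mean + x * math.log(mean) - math.lgamma(x + 1))

def affinity(c, rho):
    """A(c, rho): sum over x of min{binomial pmf, Poisson pmf with mean c * rho}."""
    return sum(min(binom_pmf(x, c, rho), poisson_pmf(x, c * rho))
               for x in range(c + 1))

alpha = 0.05
largest_c = max(c for c in range(1, 200) if affinity(c, 1.0) > alpha)
```

With $\alpha = 0.05$ the largest $c$ for which $A(c, 1)$ still exceeds $\alpha$ comes out as 63, so the bound $A(c, 1) - \alpha$ turns negative from $c = 64$ onward.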

Note that $\hat c_k = n(1 + \hat\theta_k)$ is a pseudo MLE for $c$ and is a consistent
estimator for $c_k = c(1 + \theta_k)/(1 + \theta) \le c$. In particular, $\hat c_1 = n + n_1^2/(2n_2)$ is
given in [4]. The asymptotic variance of $\hat c_k$ increases in $c$, while that of $\log \hat c_k$
decreases in $c$ because both $c^{-1/2}(\hat c_k - c_k)$ and $c^{1/2}(\log \hat c_k - \log c_k)$ converge
to zero-mean normal distributions as $c$ goes to infinity.
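
For $k = 1$ the quadratic form reduces to $\hat\theta_1 = \hat\mu(1)^2/\hat\mu(2) = n_1^2/(2 n n_2)$, so that $n(1 + \hat\theta_1) = n + n_1^2/(2n_2)$. The following sketch confirms the identity on the cholera counts from Section 6 using exact fractions. The function name `c_hat_1` is illustrative.

```python
from fractions import Fraction

def c_hat_1(counts):
    """Pseudo MLE n * (1 + theta_hat_1) with theta_hat_1 = n_1^2 / (2 n n_2)."""
    n = sum(counts.values())
    theta_hat_1 = Fraction(counts[1] ** 2, 2 * n * counts[2])
    return n * (1 + theta_hat_1)

cholera = {1: 32, 2: 16, 3: 6, 4: 1}         # frequency counts n_x from Section 6
value = c_hat_1(cholera)
closed_form = Fraction(55) + Fraction(32 ** 2, 2 * 16)   # n + n_1^2 / (2 n_2)
```

Both expressions give 87 for these counts.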

6. Applications. We consider two applications. The first (cholera)
concerns an epidemic of cholera in a village in India [2, 15]. There were
households affected by cholera but having no case. Note that $n_x$ is the number of
households having $x$ cases, with $n_1 = 32$, $n_2 = 16$, $n_3 = 6$ and $n_4 = 1$ among
$n = 55$ identified infected households with $S = 86$ cholera cases. The
second (EST) concerns $S = 2586$ expressed sequence tags (ESTs) from which
$n = 1825$ genes were found [14, 15]. An EST is a partial sequence identifying
an mRNA and ESTs are generated by sequencing randomly selected clones
in a cDNA library made from an mRNA pool. There were expressed genes
from which no EST was generated. Note that $n_x$ is the number of expressed
genes from which $x$ ESTs were generated, with $n_1 = 1434$, $n_2 = 253$, $n_3 = 71$,
$n_4 = 33$, $n_5 = 11$, $n_6 = 6$, $n_8 = 3$, $n_x = 2$ for $x \in \{7, 10, 11, 16\}$ and $n_x = 1$ for
$x \in \{9, 12, 13, 14, 23, 27\}$.

The estimates for approximation functionals are shown in Table 1, together
with the lower 5% quantiles of estimates from 400 model-based resamples,
using the nonparametric MLE $\hat Q$. All estimates are comparable in cholera,
as $\chi(\hat Q) = 1$. The pre-existing estimates are not comparable in EST, as
$\chi(\hat Q) > 1$. The linear program yields the conservative 95% nonparametric
lower confidence limits: $\theta(F_n; \epsilon_n) = 0.250$ with $n = 55$ and $\epsilon_n = 0.180$
in cholera; $\theta(F_n; \epsilon_n) = 1.408$ with $n = 1825$ and $\epsilon_n = 0.032$ in EST. These
bounds are considerably smaller than the resampling quantiles. If $\theta(f_{\hat Q})$ is
used to estimate $\theta$ in cholera, then a pseudo MLE for the number of infected
households is 88. If $\theta_2(f_{\hat Q})$ is used to estimate $\theta$ in EST, then a pseudo MLE
for the number of expressed genes is 7392.

To learn something about the approximation bias, we treat $\hat Q$ as the
true distribution and read across the rows labeled $f_{\hat Q}$ in Table 1, with the
largest value of the $\theta_k$ being $\theta(f_{\hat Q})$. The other pre-existing approximation
functionals are not better than the $\theta_k$ in EST because $\theta_{DR}/\theta = 0.41$, $\theta_{CL}/\theta =
1.46$ and $\theta_{CB} < 0$.

7. Discussion. Conditioning on the sample size $S$, the $Y_i$ arise from a
multinomial distribution with index $c$ and probabilities $p_i = \lambda_i / \sum_{j=1}^{c} \lambda_j$.
The multinomial model is more cumbersome analytically as the $Y_i$ are not
independent. Just as in contingency table analysis using log-linear models,
a Poisson-based analysis usually gives quantitatively similar or identical
results, even for fixed size samples.

Results similar to those developed here can be established for a
multiple-population species problem modeled by truncated mixtures of multivariate
densities [16]. There are also lower bounds that can be developed for the
total number of classes.

Estimating the population size by partially sampling a population is
another important and difficult problem [5]. It could be investigated by means
of various models of mixtures (e.g., binomial mixtures). Although the
population size is nonidentifiable nonparametrically, we claim that by adapting
and extending the techniques used here, we can show that confidence
intervals for the population size must be one-sided, but identifiable lower bounds
to the population size exist.



Table 1

Estimates and the lower 5% empirical quantile of resample estimates
(cholera: 1st block, EST: 2nd block)

                  θ_DR     θ_CL     θ_CB     θ_1      θ_2      θ_3      θ_4      θ_5

$f_n$             0.593    0.544    0.484    0.582
$f_{\hat Q}$      0.608    0.608    0.608    0.608
5% quantile       0.407    0.410    0.407    0.412

$f_n$             1.245    4.462   -1.395    2.227    2.849    3.000    3.071    3.404
$f_{\hat Q}$      1.245    4.488   -1.391    2.228    3.051    3.070    3.072    3.072
5% quantile       1.120    3.222   -1.755    1.964    2.432    2.446    2.455    2.455

8. Proofs.

Proof of Lemma 2.1. Let $d\Psi(\lambda) = \lambda(e^\lambda - 1)^{-1}\,dQ(\lambda)$. As $\lambda e^{-\lambda} \le e^{-1}$,

    $\{1 - g_P(0)\} \int e^{\lambda t}\,d\Psi(\lambda) = \int \lambda e^{-\lambda(1 - t)}\,dP(\lambda) \le (1 - t)^{-1}e^{-1} \le 1$

for $t \le 1 - e^{-1}$. The existence of a moment generating function (m.g.f.)
implies that $\Psi$ is uniquely determined by its identifiable moments $\int \lambda^x\,d\Psi(\lambda) =
(x + 1)!\,f_Q(x + 1)$, $x \ge 0$. The measure $\Phi$ and the distribution $Q$ are
identifiable. □

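The total variation distance $\tau(\psi, \phi)$ and the Hellinger distance $h(\psi, \phi)$
between two densities $\psi(x)$ and $\phi(x)$ over $\mathbb{R}^K$ with Borel field $\mathcal{B}$ are given
by

(8.1) $\tau(\psi, \phi) = \int |\psi(x) - \phi(x)| = 2\sup\{|\Pr_\psi(B) - \Pr_\phi(B)| : B \in \mathcal{B}\},$
      $h(\psi, \phi) = \Big\{ \int [\psi^{1/2}(x) - \phi^{1/2}(x)]^2 \Big\}^{1/2}.$

Note that $\tau(\psi, \phi)$ and $h(\psi, \phi)$ satisfy

(8.2) $h^2(\psi, \phi) \le \tau(\psi, \phi) \le 2h(\psi, \phi).$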

We introduce a useful single-parameter submodel of $\mathcal{F}$. Let $\pi(s)$ and $\eta(s)$
be two functions of $s \in (0, 1)$ with $\pi(s) \in (0, 1)$ and $\eta(s) \in (0, \infty)$. Given $Q$,
define

(8.3) $Q_s = (1 - \pi(s))Q + \pi(s)\delta(\eta(s)).$

It is clear that

(8.4) $\tau(f_{Q_s}, f_Q) = \sum_{x=1}^{\infty} |f_{Q_s}(x) - f_Q(x)| \le 2\pi(s),$

(8.5) $\theta(f_{Q_s}) = (1 - \pi(s))\theta(f_Q) + \pi(s)\theta(f_{\delta(\eta(s))}).$

Lemma 8.1. Given $\epsilon > 0$ and $f_Q \in \mathcal{F}$, $\omega(\epsilon; \theta, f_Q, \mathcal{F}) = \infty$, where

    $\omega(\epsilon; \theta, f_Q, \mathcal{F}) = \sup\{|\theta(f_Q) - \theta(f_G)| : f_G \in B(f_Q, \epsilon)\}.$

Proof of Lemmas 3.1 and 8.1. If $\pi^2(s) = \eta(s) = s^2$ in (8.3), then from
(8.4) and (8.5), $\lim_{s \to 0} \theta(f_{Q_s}) = \infty$ and $\lim_{s \to 0} \tau(f_{Q_s}, f_Q) = 0$. By (8.2), one
has $\lim_{s \to 0} h(f_{Q_s}, f_Q) = 0$ so that Lemmas 3.1 and 8.1 hold. □

Proof of Theorems 3.1 and 3.3. Under Lemma 8.1, Theorem 3.1
and Theorem 3.3 hold because of Theorem 1 and Theorem 3 in [13],
respectively. □

Proof of Theorem 3.2. Let $\hat\theta_n$ be l.a.-unbiased and l.a.-informative
for $\theta$ with the rate of convergence $r(n) \ge n^{-1/2}$. Let $s = 1/n$, $\pi(n^{-1}) =
\epsilon^2/(2n^2)$ and $\eta(n^{-1}) = 1/(r(n)n^3)$ in (8.3). Let $G_n = Q_{1/n}$ and

    $W_n = \frac{\hat\theta_n - \theta(f_Q)}{r(n)}, \quad Z_n = \frac{\hat\theta_n - \theta(f_{G_n})}{r(n)}, \quad d_n = \frac{\theta(f_{G_n}) - \theta(f_Q)}{r(n)}.$

Note that $\lim_{n \to \infty} n\pi(n^{-1}) = 0$ and $\lim_{n \to \infty} d_n = \infty$, and that $f_{G_n} \in B(f_Q,
\epsilon n^{-1}) \subset B(f_Q, \epsilon n^{-1/2})$ because $h^2(f_Q, f_{G_n}) \le \tau(f_Q, f_{G_n}) \le \epsilon^2 n^{-2}$ from (8.2),
(8.4) and (8.5). By investigating the proof of Theorem 2 in [13], with

    $u_{m,n} = 2\{E_Q[l_m^2(W_n)] + E_{G_n}[l_m^2(Z_n)] + 2E_Q|l_m(W_n)| \cdot d_n + d_n^2\},$

due to the l.a.-informativeness and l.a.-unbiasedness, we have

(8.6) $|E_Q[l_m(Z_n)]|/d_n = 1 + o(1/d_n)$ as $n \to \infty$ and then $m \to \infty$,

(8.7) $|E_Q[l_m(Z_n)]|/d_n \le |E_{G_n}[l_m(Z_n)]|/d_n + h(f_Q^{(n)}, f_{G_n}^{(n)}) \cdot u_{m,n}^{1/2}/d_n.$

Because $E_Q|l_m(W_n)| \le E_Q^{1/2}[l_m^2(W_n)]$, by the l.a.-informativeness,

(8.8) $u_{m,n}/d_n^2 = 2 + o(1)$ as $n \to \infty$ and then $m \to \infty$.

For large $n$, from the proof of Lemma A.1 in [10], we have

(8.9) $h^2(f_Q^{(n)}, f_{G_n}^{(n)}) = 2[1 - \{1 - h^2(f_Q, f_{G_n})/2\}^n] \approx n h^2(f_Q, f_{G_n}) \le \epsilon^2 n^{-1}.$

By the l.a.-unbiasedness, from (8.7), (8.8) and (8.9), it follows that

    $|E_Q[l_m(Z_n)]|/d_n = o(1)$ as $n \to \infty$ and then $m \to \infty$,

which is in contradiction to (8.6). This implies that Theorem 3.2 holds. □

Proof of Theorem 3.4. Given $z > \theta(f_Q)$, let $\pi(s) = s$ and $\eta(s) =
s/\{z - \theta(f_Q)\}$ in (8.3). As $\lim_{s \to 0} \tau(f_{Q_s}, f_Q) = 0$ and $\lim_{s \to 0} \theta(f_{Q_s}) = z$ from
(8.4) and (8.5), $\{(f_Q, \theta(f_Q)) : f_Q \in \mathcal{F}\}$ is dense in $\{(f_Q, z) : f_Q \in \mathcal{F}, z \ge \theta(f_Q)\}$.
Theorem 3.4 holds by applying Theorem 2.1 from [10]. □
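
The density argument can be illustrated numerically: with mass $s$ placed at the rate $s/\{z - \theta(f_Q)\}$, the odds of the contaminated distribution approach any chosen target $z$ above $\theta(f_Q)$. The two-point $Q$ and the target value below are hypothetical choices for illustration.

```python
import math

def theta(support, q):
    """Odds theta(f_Q) = integral of (e^lam - 1)^{-1} dQ(lam)."""
    return sum(w / math.expm1(lam) for lam, w in zip(support, q))

def perturbed_theta(support, q, z, s):
    """theta(f_{Q_s}) for Q_s = (1 - s) Q + s * delta(s / (z - theta(f_Q)))."""
    base = theta(support, q)
    eta = s / (z - base)
    return (1.0 - s) * base + s / math.expm1(eta)

support, q = [1.0, 3.0], [0.5, 0.5]          # hypothetical Q
z = 5.0                                      # any target above theta(f_Q)
values = [perturbed_theta(support, q, z, s) for s in (1e-1, 1e-2, 1e-3, 1e-4)]
errors = [abs(v - z) for v in values]
```

The entries of `errors` shrink as $s$ decreases, while by (8.4) the densities $f_{Q_s}$ and $f_Q$ differ by at most $2s$ in total variation.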

Proof of Theorem 3.5. Let $\pi(s) = s$ and $\eta(s) = s^2$ in (8.3). Because

    $\tau^2(f_{Q_s}^{(n)}, f_Q^{(n)})/8 \le 1 - \{1 - \tau(f_{Q_s}, f_Q)/2\}^n \le 1 - (1 - s)^n$






from Lemma A.1 in [10] and (8.4), we conclude that $\lim_{s \to 0} \tau(f_{Q_s}^{(n)}, f_Q^{(n)}) = 0$.
From the condition in (3.1), the definitions in (8.1) and the fact that

    $|\Pr_Q(\hat\theta_{n,u} \ge \theta(f_{Q_s})) - \Pr_{Q_s}(\hat\theta_{n,u} \ge \theta(f_{Q_s}))| \le \sup\{|\Pr_Q(B) - \Pr_{Q_s}(B)| : B \in \mathcal{B}\},$

we have by the triangle inequality,

    $\Pr_Q(\hat\theta_{n,u} \ge \theta(f_{Q_s})) + \tau(f_{Q_s}^{(n)}, f_Q^{(n)})/2 \ge \Pr_{Q_s}(\hat\theta_{n,u} \ge \theta(f_{Q_s})) \ge 1 - \alpha.$

By letting $s$ go to zero, $\Pr_Q(\hat\theta_{n,u} = \infty) \ge 1 - \alpha$ as $\lim_{s \to 0} \theta(f_{Q_s}) = \infty$ from
(8.5). This inequality holds for all $Q$, which implies that Theorem 3.5 holds. □


Proof of Lemma 4.1. Let $Q$ and $G_m$ be in $\mathcal{Q}$ with $\lim_{m \to \infty} d(F_{G_m},
F_Q) = 0$. As a function of $f_Q(x)$, $x = 1, \ldots, 2k$, $\theta_k(f_Q)$ is continuous, so it
is continuous in $F_Q$ on its domain. If $\theta_k(f_Q)$ exists, then $\theta_k(f_{G_m})$ will exist
for sufficiently large $m$ and $\theta_k(f_Q) = \lim_{m \to \infty} \theta_k(f_{G_m}) \le \liminf_{m \to \infty} \theta(f_{G_m})$.
Because

    $\theta(f_Q) = \begin{cases} \theta_{\chi(Q)}(f_Q) \le \liminf_{m \to \infty} \theta(f_{G_m}), & \chi(Q) < \infty, \\ \lim_{k \to \infty} \theta_k(f_Q) \le \liminf_{m \to \infty} \theta(f_{G_m}), & \chi(Q) = \infty, \end{cases}$

the odds $\theta(f_Q)$ is lower semicontinuous. □



Proof of Theorem 4.1. This holds following application of (3.13) 
from [10] and Lemma 4.1. □ 

Write $M > 0$ if a matrix $M$ is positive definite. Given a sequence $(\mu(0),
\mu(1), \ldots)$, define Hankel matrices $H_k = (\mu(i + j))_{i,j=0}^{k}$ and $\bar H_k = (\mu(i + j +
1))_{i,j=0}^{k}$ for each $k$. The following summarizes some results in the Stieltjes
moment problem.

Lemma 8.2. The sequence $(\mu(0), \mu(1), \ldots)$ of real numbers is the
moment sequence of a measure $\Phi$ on $(0, \infty)$ if and only if $|H_k| > 0$ and $|\bar H_k| > 0$
for $k < \chi(\Phi)$, and, when $\chi(\Phi) < \infty$, $H_k$ and $\bar H_k$ have rank $\chi(\Phi)$ for $k \ge
\chi(\Phi)$.

Proof of Theorem 4.2. Write $\Gamma_{k+1}$ and $\Gamma_{k+1}^{-1}$ as partitioned matrices,

    $\Gamma_{k+1} = \begin{pmatrix} \Gamma_k & b \\ b' & \mu(2k + 2) \end{pmatrix}, \quad \Gamma_{k+1}^{-1} = \begin{pmatrix} T & v \\ v' & w \end{pmatrix},$

where $b = (\mu(k + 2), \mu(k + 3), \ldots, \mu(2k + 1))'$, $w = (\mu(2k + 2) - b'\Gamma_k^{-1}b)^{-1}$,
$v = -w \cdot \Gamma_k^{-1}b$ and $T = \Gamma_k^{-1} + w \cdot \Gamma_k^{-1}bb'\Gamma_k^{-1}$. Note that $|\bar H_k| = (-1)^k|\Gamma_k|\{\mu(k +
1) - a_k'\Gamma_k^{-1}b\}$ because $\bar H_k$ can be obtained, by exchanging the rows $k$ times,
from

    $\begin{pmatrix} \mu(k + 1) & b' \\ a_k & \Gamma_k \end{pmatrix}.$

As $|\Gamma_{k+1}| = |\Gamma_k|(\mu(2k + 2) - b'\Gamma_k^{-1}b)$, it follows that $w = |\Gamma_k| \cdot |\Gamma_{k+1}|^{-1}$. Write

    $\theta_{k+1} = (a_k', \mu(k + 1)) \cdot \Gamma_{k+1}^{-1} \cdot (a_k', \mu(k + 1))'$
    $= a_k'Ta_k + 2\mu(k + 1)a_k'v + w \cdot \mu^2(k + 1)$
    $= a_k'(\Gamma_k^{-1} + w\Gamma_k^{-1}bb'\Gamma_k^{-1})a_k - 2w \cdot \mu(k + 1) \cdot a_k'\Gamma_k^{-1}b + w \cdot \mu^2(k + 1)$
    $= a_k'\Gamma_k^{-1}a_k + w \cdot (\mu(k + 1) - a_k'\Gamma_k^{-1}b)^2$
    $= \theta_k + |\Gamma_k| \cdot |\Gamma_{k+1}|^{-1} \cdot \{|\bar H_k|(-1)^{-k}|\Gamma_k|^{-1}\}^2.$

This means that if $\theta_{k+1}$ exists, then $\theta_{k+1}$ and $\theta_k$ satisfy

    $\theta_{k+1} = \theta_k + |\bar H_k|^2 \cdot |\Gamma_k|^{-1} \cdot |\Gamma_{k+1}|^{-1}.$

Note that $\bar H_k > 0$ when $\Gamma_{k+1} > 0$ so that $\theta_k < \theta_{k+1}$.
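
The recursion can be verified exactly for $k = 1$, where every determinant is at most $2 \times 2$. The sketch below uses the empirical moments $x!\,n_x/n$ built from the first four EST frequency counts of Section 6. The name `det_hbar_1` stands for the determinant of the Hankel matrix with entries $\mu(i + j + 1)$, $i, j = 0, 1$; the bar is a notational assumption made here.

```python
from fractions import Fraction
from math import factorial

counts = {1: 1434, 2: 253, 3: 71, 4: 33}     # first four EST frequency counts
n = 1825                                     # number of detected genes
mu = {x: Fraction(factorial(x) * nx, n) for x, nx in counts.items()}

theta_1 = mu[1] ** 2 / mu[2]                 # a_1' Gamma_1^{-1} a_1
det_gamma_1 = mu[2]
det_gamma_2 = mu[2] * mu[4] - mu[3] ** 2
det_hbar_1 = mu[1] * mu[3] - mu[2] ** 2      # shifted Hankel determinant

# theta_2 = a_2' Gamma_2^{-1} a_2 via the explicit inverse of a 2 x 2 matrix
theta_2 = (mu[4] * mu[1] ** 2 - 2 * mu[3] * mu[1] * mu[2] + mu[2] ** 3) / det_gamma_2

# Right-hand side of the recursion theta_{k+1} = theta_k + |Hbar_k|^2 / (|Gamma_k| |Gamma_{k+1}|)
recursion = theta_1 + det_hbar_1 ** 2 / (det_gamma_1 * det_gamma_2)
```

Because the arithmetic is exact, `theta_2` and `recursion` coincide as fractions, and the squared increment makes the monotonicity $\theta_1 < \theta_2$ visible.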

When $\chi(Q) < \infty$, write $|H_{\chi(Q)}| = |\Gamma_{\chi(Q)}| \cdot (\mu(0) - \theta_{\chi(Q)})$. From Lemma 8.2,
$|H_{\chi(Q)}| = 0$, which means that $\mu(0) = \theta_{\chi(Q)}$ as $|\Gamma_{\chi(Q)}| > 0$. When $\chi(Q) = \infty$,
write $|H_k| = |\Gamma_k|(\mu(0) - \theta_k)$. The sequence $\theta_k$ is strictly increasing in $k$ and
bounded above by $\mu(0)$ so that $\xi = \lim_{k \to \infty} \theta_k$ exists. Consider $(\xi, \mu(1), \mu(2), \ldots)$
associated with Hankel matrices $H_{k,\xi}$ and $\bar H_{k,\xi}$. Note that $|\bar H_{k,\xi}| > 0$ because
$\bar H_{k,\xi} = \bar H_k$, and $|H_{k,\xi}| > 0$ because $\theta_k < \xi$ and $|H_{k,\xi}| = |\Gamma_k|(\xi - \theta_k)$. From
Lemma 8.2, $(\xi, \mu(1), \mu(2), \ldots)$ is a moment sequence of a measure $\Phi_\xi$ on
$(0, \infty)$ with $\chi(\Phi_\xi) = \infty$. Let $d\Psi(\lambda) = \lambda\,d\Phi(\lambda)$ and $d\Psi_\xi(\lambda) = \lambda\,d\Phi_\xi(\lambda)$. Note
that $\Psi$ and $\Psi_\xi$ have the same moment sequence and that $\Psi$ has an m.g.f.
from the proof of Lemma 2.1. This implies that $\Psi = \Psi_\xi$, so that $\Phi = \Phi_\xi$ and
$\xi = \mu(0)$. □

Proof of Theorem 4.3. From Lemma 8.2, it follows that for $k \le
\chi(Q)$, $\Gamma_k > 0$ as $\Gamma_k$ is identical to the Hankel matrix $\bar H_{k-1}$ of the moments
of a measure $\Psi$ with $d\Psi(\lambda) = \lambda\,d\Phi(\lambda)$. This observation and Theorem 4.2
imply that Theorem 4.3 holds. □

Proof of Theorem 4.4. Let $H_{k,z}$ be obtained from $H_k$ with $\mu(0)$
replaced by $z \in \mathbb{R}$. If $\Gamma_k > 0$, then $|H_{k,z}| = |\Gamma_k|(z - \theta_k)$. When $\theta_k$ exists,
because $\Gamma_i > 0$ and $\theta_i < \theta_k$ it follows that $|H_{i,\theta_k}| = |\Gamma_i|(\theta_k - \theta_i) > 0$ for $i = 1,
\ldots, k - 1$. In addition, $|H_{k,\theta_k}| = 0$ and $\bar H_{k-1} > 0$. From [8], there exists a
measure $\Phi_k$ with $\chi(\Phi_k) = k$ such that $\int d\Phi_k(\lambda) = \theta_k$ and $\int \lambda^x\,d\Phi_k(\lambda) = \mu(x)$,
$x = 1, \ldots, 2k$. With $\Phi_k$ having no mass at zero, Theorem 4.4 holds by letting
$dQ_k(\lambda) = (e^\lambda - 1)\,d\Phi_k(\lambda)$. □

Proof of Theorem 4.5. By the strong law of large numbers, the
empirical moments, moment matrices and their determinants converge almost
surely, implying the consistency of $\hat\chi_n$ and the almost sure existence of $\hat\theta_k$
for $k \le \chi(Q)$ as $n$ goes to infinity. The delta method yields the asymptotic
normality of $\hat\theta_k$ as $n^{1/2}(f_{n,k} - f_{Q,k})$ converges to a multivariate normal
distribution, where $f_{Q,k} = (f_Q(1), \ldots, f_Q(2k))'$ and $f_{n,k} = (f_n(1), \ldots, f_n(2k))'$. □

Proof of Theorem 5.1. With $\kappa$ given by (2.1) and $Q_s$ used to show
Theorem 3.5, let $Q = \kappa(P)$, $P_s = \kappa^{-1}(Q_s)$ and $c_s$ be the integer part of
$c\{1 + \theta(f_{Q_s})\}/\{1 + \theta(f_Q)\}$. Let $\tau_s = \tau(p_1(c_s, P_s), p_1(c, P))$. It can be shown
that

    $\frac{c\{1 + \theta(f_{Q_s})\}}{1 + \theta(f_Q)} \in [c_s, c_s + 1),$

    $\lim_{s \to 0} c_s = \infty,$

    $\lim_{s \to 0} \frac{c_s}{1 + \theta(f_{Q_s})} = \frac{c}{1 + \theta(f_Q)},$

    $|\tau_s - \tau(p_2(c_s, P_s), p_2(c, P))| \le \sum_{n=0}^{c} \tau(p_3(n, P_s), p_3(n, P))\,p_2(c, P).$

Let $\varrho = 1 - g_P(0)$. Because the mean of $p_2(c_s, P_s)$ goes to that of $p_2(c, P)$ as
$s$ goes to zero, $p_2(c_s, P_s)$ tends to a Poisson density with mean $c\varrho$,

    $\lim_{s \to 0} \tau(p_3(n, P_s), p_3(n, P)) = \lim_{s \to 0} \tau(f_{Q_s}^{(n)}, f_Q^{(n)}) = 0$

and $\lim_{s \to 0} \tau_s = \lim_{s \to 0} \tau(p_2(c_s, P_s), p_2(c, P)) = 2 - 2A(c, \varrho)$. From the
condition in (5.1) and the definitions in (8.1), and because

    $|\Pr_{c,P}(\hat c_u \ge c_s) - \Pr_{c_s,P_s}(\hat c_u \ge c_s)| \le \sup\{|\Pr_{c,P}(B) - \Pr_{c_s,P_s}(B)| : B \in \mathcal{B}\},$

it follows by the triangle inequality that

    $\Pr_{c,P}(\hat c_u \ge c_s) + \tau_s/2 \ge \Pr_{c_s,P_s}(\hat c_u \ge c_s) \ge 1 - \alpha.$

The proof is completed by letting $s$ go to zero. □